Infinite swapping replica exchange molecular dynamics leads to 
a simple simulation patch using mixture potentials 
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Replica exchange molecular dynamics (REMD) becomes more efficient as the frequency of swap 
between the temperatures is increased. Recently in [Plattner et al. J. Chem. Phys. 135, 134111 
(2011)] a method was proposed to implement infinite swapping REMD in practice. Here we introduce 
a natural modification of this method that involves molecular dynamics simulations over a mixture 
potential. This modification is both simple to implement in practice and provides a better, energy 
based understanding of how to choose the temperatures in REMD to optimize efficiency. It also 
opens the door to generalizations of REMD in which the swaps involve other parameters than the 
temperature. 
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I. INTRODUCTION 



Replica exchange molecular dynamics (REMD) is one 
of the most popular methods to accelerate the confor- 
mational sampling of large biomolecules and other com- 
plex molecular systems^—. It can be viewed as a gen- 
eralization to molecular dynamics (MD) simulations of 
the replica exchange Monte Carlo method^—. The basic 
idea of REMD is to evolve concurrently several copies (or 
replica) of the system, and periodically swap their tem- 
peratures in a thermodynamically consistent way. When 
a replica feels an artificially high temperature, it explores 
its conformation space much faster than it would at the 
physical temperature; when it feels the physical temper- 
ature, equilibrium averages can be extracted from its dy- 
namics. In theory, the efficiency of REMD increases when 
the frequency of swaps is pushed up to infinity, but reach- 
ing this limit has proven difficult in practic e 12 ' 13 . Re- 
cently in Ref. [3 Plattner et al. have proposed a way to 
avoid this difficulty. The idea is to first establish analyt- 
ically what the limiting dynamics of REMD is at infinite 
swapping frequency, and then implement this dynamics 
directly instead of trying to increase the swapping fre- 
quency in the original REMD. Our main purpose here is 
to show that a natural reformulation of the technique of 
Ref. [l4] leads to a simple method in which the various 
replica in infinite swapping REMD evolve by standard 
MD over a new potential that is a temperature-dependent 
mixture of the original one which couples all the replica. 
This new method is simple to implement, and reduces to 
a patch of standard MD codes in which several replica of 
the system are evolved in parallel using forces that are 
the original molecular ones multiplied by factors that in- 
volve the energies of the replicas: these energies are the 
only quantities that must be communicated between the 
replicas as they evolve. The method gives a new per- 
spective on how to optimally choose the temperatures 
(including how many of them to pick), with implications 
for the original REMD, by analyzing simple geometrical 
characteristics of the mixture potential. As we show be- 



low, it also permits to design generalizations of REMD in 
which parameters other than the temperature are used 
to build the mixture potential. 



II. INFINITE SWAPPING REMD 

For the sake of simplicity we will consider first the case 
of a system governed by the overdamped Langevin equa- 
tion (the generalization to standard MD will be given in 
Sec. HI] below): 



x = f(x) + V2F 1 V, 



(1) 



where x € R 3n denotes the instantaneous position of the 
system with n particles, f(x) = — W(x) is the force 
associated with the potential V(x), (3 = 1/fcsT is the 
inverse temperature, r\ is a 3n-dimensional white-noise 
and we set the friction coefficient to one for simplicity. 
The solutions of ([1]) sample the Boltzmann equilibrium 
probability density, 



-/3V(as) 



(2) 



where Zp = J ffi3n e~@ v ( !B >dx. Assuming we only take two 
temperatures (the generalization to more temperatures 
is considered in Sec. IVII below) . the idea behind REMD 
is to replace (JXJ) by 



xi = f(xi) 



x 2 = f(x 2 ) + J2fc 1 (t) T] 2 



(3) 



where xi(t) and x 2 (t) are the two replica, and 
and P 2 (t) are the two temperatures that alternatively 
swap between the physical /3 and the artificial ft < j3 
(so that ksT > fcsT). These swaps are attempted 
with frequency v, and the ones from ((3i,f3 2 ) = ((3,j3) 
to (/3i,p2) = (f3,f3) are accepted with probability 



, Pp{xi)pp{x 2 ) 
mm [ ^— — r — - — -,1 



(4) 
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and similarly for the ones from (f3i,/3 2 ) = (A/3) to 
j^ 2 ) = (/3,/3). ((U) is the standard acceptance prob- 
ability used in Metropolis Monte Carlo schemes and it 
guarantees that ([3]) samples the following equilibrium 
probability distribution in (xi, /3i, x 2 , (3 2 )'- 



= PpAxi)pp 2 (x 2 ) {\5p lt pS ht p + \8p u p&fa,f)) 



(3) 



where Sp lj /3 denotes the Kronecker delta function, <5,g li( g = 
1 if P\ = (3 and Sp 1 ^p = otherwise. Summing over 

I 



(/3i, /3 2 ) then gives the equilibrium density for the replica 
positions alone, which is a symmetrized version of the 
Boltzmann densities at the two temperatures (3 and /3: 



,p(xi,x 2 ) = ±pp(xi)pp(x 2 ) + ^pp(xi)p p (x 2 ) (6) 



As a result the ensemble average at the physical temper- 
ature of any observable A(x) can be estimated from 



(A) p = / A(x) P[3 (x)dx 

(iOpp(x 1 ,X2)A(x 1 ) + UJp /3 (x 1 ,X2)A(x 2 ))Q /3i p{x 1 ,X2)dx 1 dX2 



(7) 



lim - / (u }0 ^(x 1 {t),x 2 (t))A(x 1 (t)) + L } ^{x 1 {t),X2(t))A(x 2 (t)))dt 

I -too 1 Jq 



where we defined the weight 



u n{x x ,x 2 ) 



(x^pgfa) 



Pp{xi)pp{x 2 ) + pp{x 2 )pp(xi) 

e -f3V( Xl )-t3V{ X2 ) 
e -/3V( Xl )-pV( X 2) _|_ e -fiV( Xl )-pV( X2 ) 



(8) 

The estimator ([7|) is slightly different from the one tradi- 
tionally used in REMD-, but its validity can be readily 
checked by inserting ^ and (|5J) in (J7J), and it will prove 
more convenient for our purpose. To quantify the effi- 
ciency of this estimator, notice that 

ujp^Aixi) + ojp fj A(x 2 )) 2 Qppdxidx 2 
; 2 / (u 2 ppA 2 (xi) +ujj_ l3 A 2 (x 2 ))g l3 j3dx 1 dx2 



servation made in Ref. [lj is that the limit v — > 00 can be 
taken explicitly. In this limit, the fast temperatures are 
adiabatically slaved to the positions of the slow replica, 
and these replica only feel their average effect. This leads 
to the following closed equation for the replica positions 
replacing ©: 



Xi 



x 2 = f{x 2 ) + J2(P 1 uj p ^ + P 1 w ( g i/3 )?7 2 



(10) 



Thus, simulating with (fTU)) instead of © is a concrete 
way to perform infinite swapping REMD, and several 
strategies to perform these simulations were discussed in 
Ref. [lj. Here we would like to take advantage of these 
strategies but simplify their implementation by modify- 
ing (|10p. How to do so is explained next. 



< 2 



(upjA 2 (xi) +ujp p A 2 (x2))Q,3.pdxidx 2 



4(A 2 ) P 



where lj 



(9) 

p,p = Up^{xi,x 2 ) and 
Qp p = Qp j}(xi, x 2 ), and we used the property that 
< Up p < 1. Therefore, the variance of the estima- 
tor J7]) is controlled by the variance of A(x) under the 
original density pp(x). This means that the efficiency 
of this estimator is determined by how fast the coupled 
system ((3]) converges towards equilibrium. As mentioned 
before, the higher v, the faster this convergence isi2r— , 
but large values of v requires one to make many swapping 
attempts, which slows down the simulations. The key ob- 



III. REFORMULATION 

The system (fTU)) is quite complicated to simulate be- 
cause it involves a multiplicative noise. Yet because 
this system satisfies detailed balance like the original 
REMD @ does, it has a specific structure that can be 
used to simplify it. To see how, note that the Fokker- 
Planck equation for the joint probability density of X\(t) 
and X2(t), g(t, Xi,X2), can be written as 

d t g(t) =diy(M(Q(t)gmdU + k B Tgradg(t))) (11) 

Here div and grad denote, respectively, the divergence 
and gradient operators with respect to (x±,x 2 ), and we 
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denned the mixture potential 



U(xx,x 2 ) 

= -k B Tln (e-PV(*i)-PV(<*>) + e -pv( Xl )-pv( X2 )^ 



as well as the tensor B = B(xi, X2): 







0_ 



(12) 



(13) 



It is easy to check that the stationary solution of ([HI) is 
g(t, xi,X2) — Qas(xx,X2), confirming that the limiting 
equation (fTU|) samples © like ([3]) does and therefore can 
be used in the estimator ([7]). The multiplicative nature 
of the noise in (|T0| is a direct consequence of the fact that 
the tensor B in (fTTj) depends on X\ and x 2 . Therefore, a 
natural simplification is to replace B by a constant ten- 
sor, which, for convenience, we will simply take to be the 
identity. This substitution does not affect the stationary 
solution of (fTTj) , but it changes the system of overdamped 
Langevin equations this Fokker-Planck equation is asso- 
ciated with. After some straightforward algebra, it is 
easy to see that this new system is 



±2 - (w PtP + /3- 1 Puj l3 ^)f(x 2 ) + v / 2/F I r7 2 . 



(14) 



This system of equations samples ([BJ like (fTTJj) does, and 
its solution can be used in the estimator ([7]). But in 
contrast with (fTUjl , the noise in (fTT]) is simply additive like 
in the original equation (JTJ. The only things that have 
changed in (fTT]) are the forces, which are the gradients 
with respect to Si and x 2 of the mixture potential ([12]). 
As can be seen from (fTT]) . these gradients involve the 
original forces, f(x\) and f(x 2 ), multiplied by scalar 
factors containing the weight ([8]). This means that the 
only quantities that must be communicated between the 
replicas are the potential energies V{x,\) and V(x 2 ) that 
enter this weight. 

In practice, rather than (UJ) one is typically interested 
in systems governed by the Langevin equation 



m p, 



p = fi x ) -ip + V 2 ! 171 / 3 n, 



(15) 



where m denotes the mass and 7 the friction coefficient, 
in which case the generalization of (ITT)) reads 



x\ = m Pxi 

Pi = [up,p + P~ X P u p~,p)f( x i)_ 
-IPi 



x 2 = m x p 2 



y /2jm/3- 1 r) 1 , 

P2 = (upjj + p- l fiup t p)f{x 2 ) 

~1P 2 + \/27m/3- 1 r/ 2 , 



(16) 



The solution of these equations can also be used in the 
estimator {7J| and they can be simulated in parallel us- 
ing a simple patch of a standard MD code since they 



too only involve the modification of the forces discussed 
above. The extension to molecular dynamics using other 
heat baths is straightforward. In the sequel, we will ana- 
lyze the performance of (|16p. test it on several examples, 
and generalize this system to situations with more than 
two temperatures and where other parameters than the 
temperature are used to build the mixture potential. 



IV. EFFICIENCY AND OPTIMAL CHOICE OF 
THE TEMPERATURE k B T 

The simple form of (fTT)) or (fTB"]) allows for a transpar- 
ent explanation why these equations are more efficient 
than the original (flj or (fTB"]) at sampling the equilib- 
rium density, and how to choose the artificial temper- 
ature ksT = /3 _1 to optimize this efficiency gain. To 
see this, consider a situation in which the original po- 
tential V(x) has a minimum of energy at x m . Then the 
mixture potential U(x\, x 2 ) has a minimum at (x m , x m ). 
with two channels connected to it along which this po- 
tential is a scaled down version of the original one, see 
the top panel of Fig. [JJfor an illustration. Indeed, if the 
artificial temperature in (|14p is much higher than the 
physical one, /3 <gc /3, then in the channel where x 2 w x m 
we have U(xi,x m ) pa (3V(xi) + V(x m ) in the re- 
gion where V(xi) > V(x m ), and similarly in the channel 
where x% ~ x m . Thus, along these channels, the equa- 
tion for X\ in (|16p can be approximated locally by 



xi = m 1 p 1 , 



(17) 



and similarly for x 2 . (|17p is like the original Langevin 
equation (fT5|) except that the force has been multiplied 
by a factor -C 1, meaning the energy barriers have 

been lowered by this same factor along the channels. In 
essence, by remaining close to a minimum of the energy, 
each replica helps the other to surmount barriers and ex- 
plore the landscape towards other minima, and this is 
what accelerates the sampling. Note however that, while 
a replica moves fast along a channel, its weight in the 
estimator is close zero whereas the one of the replica 
that hovers near x m is close to one. Indeed when X\ 
moves and x 2 « x m , we have Wq p(xi,x m ) ks and 
u>gg(xi,x m ) ~ 1, and similarly when x 2 moves and 
Xi « x m . Thus we really need both replica to move in 
succession, with one of them hovering near a minimum 
while the other explores the landscape and vice-versa, to 
achieve proper sampling. 

Concerning the choice of temperature, the form of (fTT)) 
suggests that the optimal fc^T = /3 _1 to pick is the high- 
est energy barrier that the system needs to surmount to 
explore its landscape: at lower values of fc^T, crossing 
this barrier is still a rare event, and at higher values, we 
start to blur the sampling by having the system visit re- 
gions of too high energies. As we illustrate next on exam- 
ples, this intuition is correct, except that entropic effects 



4 



also play an important role in high dimension and may 
slow down the sampling unless additional replicas with 
temperatures between fcgT and k B T are introduced (as 
will be done in Sec. IVI|) . 

To test (|14p and (JTHJ) and verify the results above, we 
first consider a system with potential 



V(x) = (1-x 2 ) 2 



(18) 



The mixture potential (fT2f associated with this V(x) is 
plotted in the top panel of Fig. [TJ which clearly shows 
the two channels mentioned before. The Bottom panel 
of Fig. [JJ shows a slice of the mixture potential along 
one of the channels and compares it with V(x) and its 
scaled-down version /3~ 1 /3V(x) when (3 — 25 (meaning 
that k B T — 0.04 and the energy barrier to escape the 
shallow well is about 20k bT at this physical tempera- 
ture) and /3 = 0.8. The top panel of Fig. [5] shows the 
times series of the original (UJ and the modified fl~4")) for 
these parameters values. While the solution of (UJ is 
stuck in one well, that of (TT41 explores the two wells effi- 
ciently. The middle panel of Fig. [5] shows the convergence 
rate of (fl"4"|) (estimated from the autocorrelation function 
of the position) as a function of /? and compares it to the 
analytical estimate of the rate obtained from (fTT|) in the 
high friction limit. This convergence rate reaches a max- 
imum when j3 = AT^ -1 m 0.8, consistent with the pre- 
diction from (fTT)) . Finally the bottom panel Fig. [5] shows 
the free energy reconstructed using (fT4]) with j3 = 0.8 
compared to the one obtained from the original (UJ with 
/3 = 25. 



THE IMPACT OF DIMENSIONALITY AND 
THE NEED FOR MORE THAN TWO 
TEMPERATURES 



V. 



As mentioned in Sec. IIV1 in high dimension entropic 
effects start to matter and slow down convergence unless 
more than two temperature are used. To analyze the 
impact of the dimensionality consider a system with D 
dimensions moving on the following potential 



V(x ,x 1 ,. . .,x D . 



i) = (l-^o) 2 -^o + 



D-l 1 
3=1 



(19) 



where Ai, A2, . . . , Xd-i are parameters controlling the 
curvature of the potential in the X\,x%,... ,xd-i direc- 
tions. In the original equation (|15p . the dynamics in the 
D directions are independent, but this is no longer the 
case for the limiting equation (|16p over the mixture po- 
tential. When the dimensionality is large, D 3> 1, it has 
the effect that the replica moving in the channel by (fTT)) 
seldom comes close to a local minimum of the potential 
because the basin around this minimum is quite wide; at 
the same time, it has to come close enough to one such 
minimum to allow the other replica to starts moving in 
a channel. As can be seen in Fig. [31 this introduces an 
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Figure 1. Top panel: The mixture potential (|12[) for the po- 
tential (|18p clearly showing the two channel (in dark blue) 
connected to the minimum. Bottom panel: A slice of the 
potential along one of the channel (blue solid curve), com- 
pared with the original potential V (red dashed curve) and 
its scaled-down version (j3//3)V (black dash-dotted curve). 



additional slow time scale in the system when D is large, 
which is related to the presence of an entropic barrier 
in the mixture potential. This is shown in Fig. U by 
plotting the free energy G{E\, E 2 ) of the mixture poten- 
tial U(xi,X2) by using the potential energies of the two 
replica as collective variables: 



G(E 1 ,E 2 ) = -k B T\n [ 



-0U(SB1,X 2 ) 



<R 3 " 



x S(V( Xl ) - E 1 )S(V(x 2 ) - E 2 ) dsci dx 2 

(20) 

We can estimate the additional slow time scale to 
switch from one channel to the other by calculating the 
mean time the replica moving by (|17l) takes to come 
within a region near the local minimum where its po- 
tential energy is about yfe^T above that of the energy 
minimum. When this event occurs, the other replica has 
a chance to go in his channel and start moving instead, 
since ^k B T is the typical potential energy of the system 
under physical temperature T by equipartition of energy. 
However this event becomes less and less likely as the di- 
mensionality increases because the replica moving by (|17p 
effectively feels the rescaled potential (3~ 1 j3V(x) instead 
of the original one, and so its potential energy tends to 
be of order ^k B T rather than ^k B T. Assume that 
k B T = is low enough that we can take a quadratic 
approximation of the potential near the local minimum, 
V(x) w V(x m ) + |(x - x m ) T H(x - x m ). The region 
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Figure 2. Replica exchange overdamped dynamics for V(x) = 
(x 2 - l) 2 - \x. The physical temperature is T = = 0.04 
and the auxiliary high temperature is chosen to be f = /3 _1 = 
1.25, the barrier size. The simulation time is T to t = 10 5 with 
time step dt = 0.025. Top panel: A typical trajectory (blue) 
of xi (t) of the system (|14[1 hops between both wells frequently, 
while a typical trajectory (red) under the physical tempera- 
ture will stay in one of the two wells, as the transition is very 
rare. Middle panel: The convergence rate of the REMD for 
overdamped dynamics (114[) with /3 = 25 and different choices 
of p. The blue solid crosses show the numerical result, the 
black dash-dotted curve is the estimate obtained from (|17[) in 
the high friction limit. Bottom panel: The exact free energy 
(gray solid curves), that estimated by (|14|) (blue solid curve) 
and that estimated by JT]) (red solid curve, shifted up by 0.1 
to better illustrate the results). 



that the moving replica needs to hit is bounded by the 
ellipsoid defined by \(x — x m ) T H(x - x m ) = ^ksT, 
and we can use transition state theory to estimate the 
mean frequency at which the system governed by (fTT)) 
hits this ellipsoid: 



(deti^/^^VWfJ 



D/2 



-P//3 



VH, (21) 



where D 
\x T Hx -- 



- 3n and an is the surface area of the ellipsoid 
1. Using Carlson's bouncU^ for an, we obtain 




Figure 3. Replica exchange dynamics (|14[1 for the potential 
(fT9|) with D = 10, = 25 and ft = 1. Top two panels: Typical 
trajectories of xo for the two replica. Middle panel: Typical 
trajectories of energies for the two replica. Bottom panel: 
Corresponding weight factor CO a p cLS El function of t. The 
system switches between the two channels as w» s switches 
value between and 1. This introduces an additional slow 
time scale to the system. 



an upper bound 



v < 



d 1 / 2 {2tt) d /A_/^r 

T((D + 1)/2)V 



D/2 



(22) 



where A is the mean curvature of the potential well. The 
frequency v also gives the mean rate at which the two 
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Figure 4. The mixture potential plotted using the energies 
of the two replica as coarse grained variables. The entropic 
barrier at E\ = E2 introduces a slow time scale for switching 
between channels. 



replica switch from moving fast in the channels or re- 
maining trapped near a minimum. Fig. [5] shows the con- 
vergence rate of ([M)) (estimated from the autocorrelation 
function of the position) for the potential flU]) and shows 
that this rate is indeed dominated by the mean hitting 
frequency in ([22]) when D is large (D = 10 for the re- 
sults reported in the figure: D = 3n for system (fT5|)). To 
avoid this slowing down effect, more than two tempera- 
ture must be used, as explained next. 
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Figure 5. The convergence rate of the REMD for overdamped 
dynamics with D = 10 with /3 = 25 and different choices of 
/3. The blue solid crosses show the numerical result, the black 
dash-dotted curve is the upper bound (|22[) obtained from the 
inverse of mean hitting time of (1171) to a small ball around the 
local minimum of the potential where the energy is of order 
fcflT from that of this minimum 



VI. USING MULTIPLE TEMPERATURES 

The discussion in Sec. [V] indicates the need to take 
more than two temperatures to accelerate convergence 
for high dimensional systems. If we use N temperatures 
from the physical ksT to the optimal fceT, i.e. 



Pi=P = 



1 



> /3 2 > • • • > Pn = P = 



1 



kef 1 



(23) 



then (IT21 generalizes into the following mixture potential 
constructed by symmetrization over the Nl permutations 
of the N temperatures among the TV replicas: 

f7(xi, . . .,x N ) 

= -/j B Tln^e~ fty(x -< 1 > ) '''~^™ y(av < JV > ) ( 24 ) 

a 

where ^ CT denotes the sum over all the permutations a 
of the indices {1, 2, . . . , N}. In turn, the system (fT6|) 
becomes 



Xj = m '-pj, 



(25) 



where j = 1, . . . , N and t]j are independent white- noises. 
Here we defined 

Rj = P~~ 1 ^2Pa( 3 )U tT (l),...,<T(N)(Xl,. ■ ■ ,X N ) (26) 



with 



w CT (i),... )(7 (iv)(aJi, • • • ,x N ) 

e -/3iV(av (1) ) l3 N V(x a{N) ) 



(27) 



E 



-/8 1 V(m <r , w )...-/3wV(av w ) 



If the temperatures in (|23j) are far apart, then at any 
given time there typically is one specific permutation a* 
such that Wo-«(i),...,<7»(iv)(sci> • • • , ^w) ~ 1 whereas these 
weights are close to zero for all the other permutations. 
This is the multiple replicas equivalent of the slow switch 
phenomenon between ujp g and ujg @ being alternatively 
1 or that we observed in Sec. |V| with two replicas and 
it means that 



R i 



*(J) 



< 1, 



(28) 



i.e. all the forces in (|23|) are rescaled by factors that are 
less or equal to 1. Up to relabeling of the replicas, we can 
always assume temporarily that cr*(j) = j, meaning that 
the factors Rj are ordered as 1 = R% > R2 > ■ ■ ■ > Rn- 
The most likely way for these factors to change order is 
that one of the j-th replica hits a small ball where its 
potential energy becomes of order ksTj_\. again this is 
the multiple replica equivalent of the channel switching 
process that we observed in Sec. [V] with two replicas. 
When this process occurs, the permutation a* for which 
the weight is approximately one becomes that in which 
the indices j — 1 and j have been permuted. The fre- 
quencies Vj at which these swaps occur can be estimated 
as in Sec. [V] (compare (1221)): 




D/2 



(29) 



This estimate suggests that we should take a geometric 
progression of temperatures in which their successive ra- 
tio is kept constant in order for all the Vj (and hence the 
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time scales of channel switching) to be of the same order: 



3 + 1 

ft 



1/(N-1) 



3 = 1, 



,iV-l 



(30) 



This choice agrees with the conventional choice in the 
literature (see e.g. discussions in Refs. (l7l - [20j ) but gives 
a different perspective on it. 

The discussion above also indicates how many replicas 
should be used. Specifically, one should aim at elimi- 
nating the slow time scale of channel switching by tak- 
ing the successive temperature sufficiently close together: 
clearly, in (|30|) the higher N, the closer to 1 the ratio 
ft+i/ft becomes even if (3/j3 is very small. However, 
this may require taking many replicas, which in practice 
poses a difficulty for our approach because the number 
AM of terms involved in the weight (f2~T)) grows very fast 
with N. 

Several strategies can be used to alleviate this problem. 
For example, one can decrease the effective dimensional- 
ity of the system by only raising the temperature of a 
few important degrees of freedom in the system. This 
idea was implemented e.g. in Ref. [2l] for biomolecular 
simulations in solvent. 

Another strategy, originally proposed in Ref. 14 is to 
perform partial swapping. Instead of symmetrizing the 
potential over the whole set of the N replicas associated 
with the N temperatures, the idea is to divide them into 
several groups consisting a moderate number of replicas. 
In each group, a mixture potential like (|2"4")l with N re- 
placed by the number of replica in the group is used to 
evolve the system. To fully mix the temperatures, mul- 
tiple partitions are used to distribute the replica in the 
different groups and the temperatures in the groups of 
each partition are reassigned dynamically. While none 
of partition by itself will fully mix every replica, com- 
bining these partitions together permits to achieve a full 
mixture of the N replica. 

To simplify the presentation of the algorithm, let us 
consider the case N = 3, with the two partitions given 
by A = (12) (3) and B = (1)(23); the extension to the 
general case is straightforward. We denote at (t) the tem- 
perature assigned to the z-th replica, which takes value 
in /3i = /3,/?2 and /?3 = (3. At the start of the simula- 
tion, we set «i(0) = /3i. We then evolve the system using 
the two partitions A and B alternatively and dynami- 
cally reassign the temperatures as we switch between the 
two partitions. This is done by repeating the following 
procedures which evolve the system from time t to time 
t + 2 At: 

1. Evolve the system using partition A from t to t+At: 
The group of replica 1 and 2 is evolved using (|16|) 
with the mixture potential 

U ai[t)M2{t) {x 1: x 2 ) = -p-Hn(e- a ^ v ^- a ^ v ^ 

_|_ e -a 2 (t)V{x 1 )-u 1 (t)V(x 2 )y 




As the other group only consists of replica 3, it 
is evolved under scaled potential a 3 (t)l3^ 1 V(x 3 ) 
which is just the mixture potential with only one 
replica. 

2. At time t + At, reassign the temperatures within 
each group in partition A: We set 

At)=ai{t) 
At)=a 2 (t) 

with probability uj ai ( t ),a 2 (t)( x i-, x 2) and 

(a x (t + At) =a 2 (t) 
\a 2 (t + At) = ai(t) 

with probability uj a2(t ^ ai(t) (x 1 ,x 2 ) = 1 - 
UJ a 1 (t).a 2 (t)( x i-i x 2)- Hence, one particular assign- 
ment is chosen for each group from the symmetriza- 
tion. 

3. Repeat the above two steps to evolve partition B 
from t+At to t+2At. The replica 1 is evolved under 
the potential ax(t + At)f3^ 1 V(x 1 ). The group of 
replica 2 and 3 is evolved with the mixture potential 

U a2 (t+At),u 3 {t+At){ x 2, x 3) 

: ln ^ e -a2(t+At)V(x 2 )-a 3 (t+At)V(x3) (32) 



+e" 



-a 2 (t+At)V(x 3 )-a 3 (t+At)V(x 2 )\ 



4. At time t + 2At, reassign the temperatures within 
each group in partition B: We set 




2 At) = a 2 (f • 
2At) = a 3 (t 



At) 
At) 



with probability u> a2 (t+At) : a 3 (t+At){ x 2, x 3) and 

ia 2 (t + 2At) = a 3 (t + At) 
\a 3 (t + 2At) = a 2 (t + At) 

with probability u a3 ( t +At),a 2 (t+At)( x 2, x 3) = 1 - 

UJ a 2 (t+At),a 3 {t+At)( x 2, x 3)- 

This partial mixing strategy can be viewed as a general- 
ization of the usual replica exchange molecular dynamics 
in which several replica are grouped together and evolved 
under a mixture potential. The parameter At is analo- 
gous to the inverse of swapping attempt frequency in the 
conventional replica exchange. Therefore, it is more ad- 
vantageous to take a small At to increase the swapping 
frequency. In practice, we can take At equal to the time- 
step used in the MD simulations. 

The performance of the partial swapping algorithm is 
illustrated in Fig. [6] with three temperatures f3\ = 25, 
/?2 = 5 and f3 3 = 1. Compared with Fig. [3] the slow time 
scale of switching between channels is now removed due 
to introduction of an intermediate temperature. 



2 4 6 8 10 
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Figure 6. Partial swapping REMD for the potential (|19[) with 
Pi = 25, P% — 5 and P% = 1. A typical trajectory for one of 
the replica is plotted. 



VII. LENNARD-JONES EXAMPLE 



x iu 

Figure 7. Molecular dynamics example. A typical trajectory 
of the pair distance between the pair of particles interact with 
double well potential in the molecular dynamics simulation 
under physical temperature (red) stays in the compressed and 
elongated states for a long time, while the trajectory under 
the dynamics (|16|) exhibits frequent transitions between the 
compressed and elongated states. 



Finally, to test the performance of our algorithm on a 
more realistic example, we apply (HHJ) to the model sys- 
tem proposed in Ref. [22|. This system consists of N two- 
dimensional particles in a periodic box with side length I. 
All particles have the same mass m, and they interact 
with each other with the Weeks-Chandler-Anderson po- 
tential defined as 

FwCA(r) = 4e((a/r) 12 - (a/r) 6 ) + e, (33) 

ifr < rwcA = 2 1/6 <t, and VwcaM = otherwise, except 
for a pair of particles, which interact via a double well 
potential 

V dw {r) = h[l — 2 J . (34) 

We take N = 16, I = 4.4, a = 1, h = 1, w = 0.5, and e = 
1 in the simulation. The physical temperature is T = 0.2, 
and for this system, it turns out that it suffices to use 
two temperatures with the auxiliary high temperature 
chosen to be T = 1. The quantity of interest is the free 
energy associated with the distance of the pair of particles 
interacting via the double well potential. The trajectories 
of the pair distance is shown in Figure compared to a 
direct molecular dynamics simulation. While the original 
dynamics exhibits metastability in switching between the 
compressed and elongated states of the pair distance, it 
is observed that the dynamics on the mixture potential 
efficiently sample the configurational space. 



VIII. GENERALIZATIONS 

We have used the mixture potential to mix two tem- 
peratures in the above discussion but the idea extends 
naturally to mixture based on other parameters. For 
example, we can mix the original potential with a modi- 
fied one in which the barriers between metastable regions 
is reduced. Such a modified potential may come from 



e.g. spatial warping^ 3 - or solute tempering 21 . If we de- 
note by V the auxiliary modified potential, the dynamics 
is given by 

' xi = m Pi, 

Pi = (u v ,vf(xi) + uj vv f(xi)) 

-1Pi + V^W^m Vi(t), / 35 s 

± 2 = rn~ 1 p 2 , 

P2 = i^V,V~f( x 2) + Uy t yf(x2)) 

-IP2 + V^fF^m r) 2 (t), 

where / and / are the forces corresponding to the poten- 
tials V and V respectively, and the weight uj v v is given 
by 

U) Vj y(xi,X 2 ) = e _ (3y(a , l) „ /3 v' (a , 2) +e -pv( Xl )-[3V( X2 ) 

(36) 

and similarly for uj v v . The performance (|35|) is illus- 
trated on a double-well example in Figure IU here the 
modified potential V is simply the original one in which 
we have removed the barrier. 

IX. CONCLUDING REMARKS 

We have presented a natural reformulation of the in- 
finite swapping limit of REMD that enables a simple 
implementation in which forces in standard MD simula- 
tions are rescaled by factors involving the energies of all 
the replica. This reformulation is equivalent to having 
the system evolve over a mixture potential, and thereby 
permits to analyze the efficiency of REMD by using fa- 
miliar tools like Arrhenius formula and transition state 
theory. It also gives us insights on how to choose an opti- 
mal sequence of temperatures in REMD. Finally, it leads 
naturally to generalizations in which the mixture poten- 
tial is constructed by varying parameters in the potential 
other than the temperature, like for example those used 
in spatial warping 23 or solute tempering 21 . 
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Figure 8. Numerical results for the dynamics (|35l) . Top panel: 
The physical double well potential Vix) — (x 2 — l) 2 /4 (blue) 
and the auxiliary potential V (gray) where the barrier is re- 
moved. The other parameters in the model are m = 1, 7 = 1, 
j3 = 100, Ttot = le5 and At = 0.1. Middle panel: A typi- 
cal trajectory of X\(t) of the dynamics (|35|) . Since V has no 
barrier, the dynamics efficiently explore the region between 
the two local minima of the original potential. Bottom panel: 
Comparison of the estimated (blue) and exact (gray) free en- 
ergies. 



Appendix A: Dynamics based on Mixture 
Hamiltonian 

The formulation of the method presented in text in- 
volves a mixture potential, but this mixing can be done 
on the level of Hamiltonians too. We briefly describe this 
alternative in this appendix. Consider a mixture Hamil- 
tonian of two replica 

H( Xl , Pl ,x 2 ,p 2 ) = -k B T\n ^-^-0E 2+e -0 El -0E^ 

(Al) 

where E\ — E(xi,p 1 ) and E 2 = E(x 2 ,p 2 ). The equa- 
tions of motion associated with this Hamiltonian are 

'xi = m -1 ^^ + p-t-fiujp ^)p x , 
Pi = (up,p + P~ l Pup,[j)f{xi) 

x 2 =m- 1 (ujpj j +(3- 1 (3uj ljt p)p 2 , 

-IP2 + \/2rm^ =I ri 2 . 
where the weight functions ^ are given by 

u>p iP (x uPl1 x 2 ,p 2 ) = e _ m _p E2 + e _ MEl ■ (A3) 

Observe that in the mixture Hamiltonian dynamics, both 
equations for x and p have rescaling terms depending on 
the weight s uj^ s. Thus, compared with (1161) . the dy- 
namics in (IA2[) mixes together both the kinetic and the 
potential energies of the two replicas. As a consequence, 
in high dimension, the switching of uj^ ^ from to 1 and 
vice-versa will be further slowed down. This suggests 
that it is more advantageous to use (|T5)) rather than (| A2|) . 



* jianfeng@math.duke.edu 
t eve2@cims.nyu.edu 

1 U. H. E. Hansmann, Chem. Phys. Lett. 281, 140 (1997). 

2 Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 314, 141 
(1999). 

3 Y. Sugita, A. Kitao, and Y. Okamoto, J. Chem. Phys. 
113, 6042 (2000). 

4 F. Rao and A. Caflisch, J. Chem. Phys. 119, 4035 (2003). 

5 D. J. Earl and W. Deem, Phys. Chem. Chem. Phys. 7, 
3910 (2005). 

6 R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 
(1986). 

7 C. J. Geyer, in Computing Science and Statistics: Proc. 
23rd Symposium on the Interface, edited by E. M. Kerami- 
das (Interface Foundation, Fairfax Station, VA, 1991) pp. 
156-163. 



C. J. Geyer and E. Thompson, J. Amer. Statist. Assoc. 90, 
909 (1995). 

9 K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 
(1996). 

10 E. Marinari, G. Parisi, and J. J. Ruiz-Lorenzo, in Spin 
Glasses and Random Fields, edited by A. P. Young (World 
Scientific, Singapore, 1998) pp. 59-98. 

11 M. C. Tesi, E. J. Janse van Rensburg, E. Orlandini, and 
S. G. Whittington, J. Stat. Phys. 82, 155 (1996). 

12 D. Sindhikara, Y. Meng, and A. E. Roitberg, J. Chem. 
Phys. 128, 024103 (2008). 

13 M. J. Abraham and J. E. Gready, J. Chem. Theory Corn- 
put. 4, 1119 (2008). 

14 N. Plattner, J. D. Doll, P. Dupuis, H. Wang, Y. Liu, and 
J. E. Gubernatis, J. Chem. Phys. 135, 134111 (2011). 



10 



P. Dupuis, Y. Liu, N. Plattner, and J. D. Doll, Multiscale 
Model. Simul. 10, 986 (2012). 

B. C. Carlson, Proc. Amer. Math. Soc. 17, 32 (1966). 

D. A. Kofke, J. Chem. Phys. 117, 6911 (2002). 

N. Rathore, M. Chopra, and J. J. de Pablo, J. Chem. 
Phys. 122, 024111 (2005). 

E. Rosta and G. Hummer, J. Chem. Phys. 131, 165102 
(2009). 



R. Denschlag, M. Lingenheil, and P. Tavan, Chem. Phys. 
Lett. 473, 193 (2009). 

P. Liu, B. Kim, R. A. Friesner, and B. J. Berne, Proc. 
Natl. Acad. Sci. USA 102, 13749 (2005). 
C. Dellago, P. G. Bolhuis, and D. Chandler, J. Chem. 
Phys. 110, 6617 (1999). 

Z. Zhu, M. Tuckerman, S. Samuelson, and G. Martyna, 
Phys. Rev. Lett. 88, 100201 (2002). 



Equilibrium probability density 




Pair distance 




Reaction rate 




0.1 0.2 0.3 0.4 



Reaction rate 



10 




Pair distance 





200 400 600 800 1000 



Probability density of potential energy 




Probability density of pair distance 




